Influence of Text Type and Text Length on Anaphoric Annotation
Abstract
We report the results of a study of inter-annotator agreement in anaphoric annotation. The study examines how two factors, text length and text type, affect agreement on a corpus of scientific articles and newspaper texts. To measure inter-annotator agreement, we compare existing approaches and propose measuring each step of the annotation process separately instead of measuring only the resulting anaphoric relations. A total of 3642 anaphoric relations were annotated in a corpus of 53038 tokens (12327 markables). The results show that text type has more influence on inter-annotator agreement than text length. Furthermore, well-defined annotation instructions and coder training are crucial for obtaining good annotation results.
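The idea of measuring each annotation step separately can be illustrated with a minimal sketch. The abstract does not say which agreement coefficient the study uses, so Cohen's kappa is an assumption here, and the step names and labels below are hypothetical data invented for illustration, not results from the paper.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's own label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    categories = freq_a.keys() | freq_b.keys()
    expected = sum(freq_a[c] * freq_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical per-step labels (assumed steps, not taken from the paper).
steps = {
    "is_anaphor": (["yes", "no", "yes", "no"], ["yes", "no", "yes", "no"]),
    "relation_type": (["a", "a", "b", "b"], ["a", "b", "a", "b"]),
}
per_step_kappa = {name: cohen_kappa(a, b) for name, (a, b) in steps.items()}
```

Reporting one value per step, rather than a single value for the final relations, shows where annotators diverge: in this made-up data they agree fully on which markables are anaphors but only at chance level on the relation type.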
Similar resources
Web-based Annotation of Anaphoric Relations and Lexical Chains
Annotating large text corpora is a time-consuming effort. Although single-user annotation tools are available, web-based annotation applications allow for distributed annotation and file access from different locations. In this paper we present Serengeti, a web-based application for annotating anaphoric relations, which will be extended to support the annotation of lexical chains.
Annotation of anaphoric relations in biomedical full-text articles using a domain-relevant scheme
Biomedical literature has been the focus of relevant information extraction projects. However, no corpus of full scientific articles annotated with anaphoric links exists in this domain for training and evaluating anaphora resolution systems, which are an important part of information extraction efforts. We have created a corpus of biomedical articles that are annotated with anaphoric links...
Corpus Annotation And Reference Resolution
A variety of approaches to annotating reference in corpora have been adopted. This paper reviews four approaches to the annotation of reference in corpora. Following this we present a variety of results from one annotated corpus, the UCREL anaphoric treebank, relevant to automated reference resolution.
WikiCoref: An English Coreference-annotated Corpus of Wikipedia Articles
This paper presents WikiCoref, an English corpus annotated for anaphoric relations, where all documents are from the English version of Wikipedia. Our annotation scheme follows that of OntoNotes with a few differences. We annotated each markable with coreference type, mention type and the equivalent Freebase topic. Since most similar annotation efforts concentrate on very specific types of w...
Arabic anaphora resolution: corpora annotation with coreferential links
Annotated resources are much needed for the evaluation and training of anaphora resolution systems. Coreferential chain annotation is a difficult task that cannot be carried out without an appropriate tool. In this paper, we present our work on annotating Arabic corpora with anaphoric links (i.e., annotating the identity relation between anaphors and their antecedents). In particular,...